This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

feat: add completed/incompleted flag in memdisk cache, unlock storage_manager #50

Merged
lanlou1554 merged 20 commits into main from ll/unlock_disk_manager on Apr 29, 2024

Conversation

lanlou1554 (Collaborator) commented on Apr 27, 2024

The main goal: during a disk write, we don't want to hold any lock. (A minimal sketch of the idea follows the issue references below.)

  1. add a completed/incompleted flag in the memdisk cache
  2. unlock disk_manager
  3. unlock storage_manager via fine-grained locks
  4. now our service can handle multiple identical requests simultaneously without conflicts
  5. add pin/unpin to the replacer to pass test_storage_manager_parallel_2; otherwise the lock could still have a concurrency issue, because one thread may evict another thread's in-progress data, and then the get after the put would find nothing : ( This issue could also be common for mem_cache
  6. put the tests in server.rs together; otherwise they fail with: thread 'server::tests::test_download_file' panicked at server.rs:217:27: error binding to 127.0.0.1:3030: error creating server listener: Address already in use (os error 48)

Solves #27 and #57; partially solves #25.
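To make the design above concrete, here is a compilable sketch of the completed/incompleted flag plus per-key notification, with the disk write happening outside any lock. This is not the PR's actual code: the names (`StatusOfKeys`, `put`, `wait_completed`) and the demo in `main` are illustrative, and it assumes a recent tokio with the full feature set.

```rust
use std::{collections::HashMap, future::Future, sync::Arc, time::Duration};
use tokio::sync::{Notify, RwLock};

enum Status {
    Incompleted,
    Completed,
}

struct StatusOfKeys {
    inner: RwLock<HashMap<String, (Status, Arc<Notify>)>>,
}

impl StatusOfKeys {
    fn new() -> Self {
        Self { inner: RwLock::new(HashMap::new()) }
    }

    /// Writer path: register the key as in-progress, run the slow disk write
    /// with no lock held, then flip the flag and wake any waiting readers.
    async fn put(&self, key: &str, write_to_disk: impl Future<Output = ()>) {
        let notify = Arc::new(Notify::new());
        self.inner
            .write()
            .await
            .insert(key.to_string(), (Status::Incompleted, notify.clone()));
        // The write guard above was a temporary and has been dropped, so the
        // slow disk write below runs without holding any lock.
        write_to_disk.await;
        {
            let mut guard = self.inner.write().await;
            if let Some((status, _)) = guard.get_mut(key) {
                *status = Status::Completed;
            }
        }
        // Notify outside the lock scope so woken readers can grab it at once.
        notify.notify_waiters();
    }

    /// Reader path: if the key is still in progress, wait on its Notify.
    async fn wait_completed(&self, key: &str) {
        loop {
            let guard = self.inner.read().await;
            let notify = match guard.get(key) {
                None | Some((Status::Completed, _)) => return,
                Some((Status::Incompleted, n)) => n.clone(),
            };
            let notified = notify.notified();
            tokio::pin!(notified);
            // Register for the wakeup before releasing the lock so that a
            // `notify_waiters` fired in between cannot be missed.
            notified.as_mut().enable();
            drop(guard);
            notified.await;
        }
    }
}

#[tokio::main]
async fn main() {
    let status = Arc::new(StatusOfKeys::new());
    let writer = {
        let status = status.clone();
        tokio::spawn(async move {
            // Stand-in for the real disk write.
            let slow_disk_write = tokio::time::sleep(Duration::from_millis(50));
            status.put("some-remote-key", slow_disk_write).await;
        })
    };
    // Demo only: give the writer a moment to register the key.
    tokio::time::sleep(Duration::from_millis(10)).await;
    status.wait_completed("some-remote-key").await;
    writer.await.unwrap();
    println!("reader observed the key as Completed");
}
```

The two key points are that the write guard is dropped before the slow write starts, and that a reader registers its `Notified` future before releasing the lock, so it cannot miss the wakeup.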

TODO1: add a GitHub issue for writing data to the network in put_data_to_cache. This will require changing the current pin/unpin strategy, since correctness currently relies on every put being followed by a get.

TODO2: in lru.rs, if the front entry is pinned, we should actually look for the next element to evict (sketched below). The current implementation is only a serious problem in the disk_cache case, since get_data will fail there. I think this scenario is rare, so I'm keeping the current implementation for simplicity, but we need to add a GitHub issue.
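A possible shape for that pinned-aware eviction, as a standalone sketch rather than a patch to lru.rs: it uses std containers instead of the crate's LinkedHashMap, and the names (`PinAwareLru`, `Entry`) are illustrative.

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

struct Entry {
    size: usize,
    pin_count: usize,
}

struct PinAwareLru<K: Hash + Eq + Clone> {
    map: HashMap<K, Entry>,
    order: VecDeque<K>, // front = least recently used
    size: usize,
    capacity: usize,
}

impl<K: Hash + Eq + Clone> PinAwareLru<K> {
    fn new(capacity: usize) -> Self {
        Self { map: HashMap::new(), order: VecDeque::new(), size: 0, capacity }
    }

    fn pin(&mut self, key: &K) {
        if let Some(e) = self.map.get_mut(key) {
            e.pin_count += 1;
        }
    }

    fn unpin(&mut self, key: &K) {
        if let Some(e) = self.map.get_mut(key) {
            e.pin_count = e.pin_count.saturating_sub(1);
        }
    }

    /// Insert a key (assumed not already present, for brevity) and evict
    /// unpinned entries starting from the LRU end. Pinned entries are skipped
    /// instead of being evicted while their data is still in progress.
    fn put(&mut self, key: K, size: usize) -> Vec<K> {
        self.map.insert(key.clone(), Entry { size, pin_count: 0 });
        self.order.push_back(key);
        self.size += size;

        let mut evicted = Vec::new();
        let mut idx = 0;
        while self.size > self.capacity && idx < self.order.len() {
            let candidate = self.order[idx].clone();
            if self.map[&candidate].pin_count > 0 {
                // Skip the pinned entry and try the next-oldest one.
                idx += 1;
                continue;
            }
            let entry = self.map.remove(&candidate).unwrap();
            self.size -= entry.size;
            let _ = self.order.remove(idx);
            evicted.push(candidate);
        }
        evicted
    }
}

fn main() {
    let mut lru = PinAwareLru::new(10);
    lru.put("a", 6);
    lru.put("b", 4);
    lru.pin(&"a");
    // "a" is the oldest entry but it is pinned (in progress), so "b" is
    // evicted instead; a later unpin("a") makes "a" evictable again.
    let evicted = lru.put("c", 4);
    assert_eq!(evicted, vec!["b"]);
    lru.unpin(&"a");
    println!("evicted: {:?}", evicted);
}
```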

TODO3 (important): move the unpin in get_data_from_cache from mod.rs into memory.rs & disk.rs, at the end of the tokio::spawn, but this needs a lot of refactoring, so a GitHub issue is needed here as well.

codecov-commenter commented on Apr 27, 2024

Codecov Report

Attention: Patch coverage is 86.20690%, with 88 lines in your changes missing coverage. Please review.

Project coverage is 87.71%. Comparing base (94b59cc) to head (e7f1349).

Files | Patch % | Lines
...age-node/src/cache/data_store_cache/memdisk/mod.rs | 78.26% | 30 Missing and 30 partials ⚠️
storage-node/src/storage_manager.rs | 90.47% | 1 Missing and 17 partials ⚠️
storage-node/src/cache/replacer/lru_k.rs | 93.75% | 3 Missing and 1 partial ⚠️
.../cache/data_store_cache/memdisk/data_store/disk.rs | 84.61% | 0 Missing and 2 partials ⚠️
storage-node/src/cache/replacer/lru.rs | 97.26% | 2 Missing ⚠️
...ache/data_store_cache/memdisk/data_store/memory.rs | 92.85% | 0 Missing and 1 partial ⚠️
storage-node/src/disk/disk_manager.rs | 0.00% | 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##             main      #50    +/-   ##
========================================
  Coverage   87.71%   87.71%            
========================================
  Files          20       20            
  Lines        2539     3077   +538     
  Branches     2539     3077   +538     
========================================
+ Hits         2227     2699   +472     
- Misses        181      207    +26     
- Partials      131      171    +40     


lanlou1554 marked this pull request as draft on April 28, 2024 at 05:12
lanlou1554 marked this pull request as ready for review on April 28, 2024 at 05:49
Review thread on storage-node/src/cache/replacer/lru.rs (outdated, resolved):
@@ -10,7 +12,8 @@ use super::ReplacerValue;
 /// objects. The replacer will start evicting if a new object comes that makes
 /// the replacer's size exceeds its max capacity, from the oldest to the newest.
 pub struct LruReplacer<K: ReplacerKey, V: ReplacerValue> {
-    cache_map: LinkedHashMap<K, V>,
+    // usize is pin count
+    cache_map: LinkedHashMap<K, (V, usize)>,
Member:

Shall we make pin_count a field of ReplacerValue since it'll be used by all replacers?

Collaborator (Author):

Sure! But maybe file a GitHub issue and do it later, since this PR is already too large : (
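For reference, a stripped-down sketch of the tuple-based bookkeeping that the diff above introduces, which is what every replacer would end up repeating (a plain `HashMap` is used here, so recency order and eviction are omitted; names and signatures are illustrative, except that `pin` taking a count mirrors the usage visible later in this PR):

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Stand-in for one replacer's state: the second tuple element is the pin
/// count, mirroring `LinkedHashMap<K, (V, usize)>` in the diff above.
pub struct Replacer<K: Hash + Eq, V> {
    cache_map: HashMap<K, (V, usize)>,
}

impl<K: Hash + Eq, V> Replacer<K, V> {
    pub fn new() -> Self {
        Self { cache_map: HashMap::new() }
    }

    /// Pin a key `count` times (e.g. once per waiter that will later get it).
    pub fn pin(&mut self, key: &K, count: usize) -> bool {
        match self.cache_map.get_mut(key) {
            Some((_, pin_count)) => {
                *pin_count += count;
                true
            }
            None => false,
        }
    }

    pub fn unpin(&mut self, key: &K) -> bool {
        match self.cache_map.get_mut(key) {
            Some((_, pin_count)) if *pin_count > 0 => {
                *pin_count -= 1;
                true
            }
            _ => false,
        }
    }

    /// Eviction code would check this and skip pinned entries.
    pub fn is_pinned(&self, key: &K) -> bool {
        matches!(self.cache_map.get(key), Some((_, c)) if *c > 0)
    }
}
```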

Resolved review thread on storage-node/src/cache/replacer/mod.rs.
Resolved review thread on storage-node/src/storage_manager.rs (outdated).
xx01cyx (Member) commented on Apr 29, 2024

Maybe implementing something like this would introduce less complexity and make the code easier to write & maintain?

https://github.com/kaimast/tokio-condvar/blob/main/src/lib.rs

lanlou1554 (Collaborator, Author) replied:

> Maybe implementing something like this would introduce less complexity and make the code easier to write & maintain?
>
> https://github.com/kaimast/tokio-condvar/blob/main/src/lib.rs

Cool! (I never knew a lock could be dropped : () Updated, PTAL at the latest commit!

Comment on lines +214 to +216
let notified = notify.0.notified();
drop(status_of_keys);
notified.await;
Member:

Maybe it's better to have a customized CondVar (something like this). But we can keep it as is and refactor it later.

Collaborator (Author):

We can later add a GitHub issue to record all the refactoring actions?
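For the record, a sketch of what such a customized CondVar could look like. It follows the spirit of the linked tokio-condvar crate but is not its actual API; it simply packages the pattern from the snippet above (register the `Notified` future, drop the guard, await):

```rust
use tokio::sync::{MutexGuard, Notify};

/// Minimal condvar-style helper over `tokio::sync::Notify`. After `wait`
/// returns, callers re-acquire the lock and re-check their condition in a
/// loop, exactly like a classic condition variable.
pub struct CondVar {
    notify: Notify,
}

impl CondVar {
    pub fn new() -> Self {
        Self { notify: Notify::new() }
    }

    /// Release `guard` and wait for the next `notify_all`.
    pub async fn wait<T>(&self, guard: MutexGuard<'_, T>) {
        let notified = self.notify.notified();
        tokio::pin!(notified);
        // Register for the wakeup before releasing the lock so that a
        // notification sent in between cannot be missed.
        notified.as_mut().enable();
        drop(guard);
        notified.await;
    }

    pub fn notify_all(&self) {
        self.notify.notify_waiters();
    }
}
```

A caller would loop: lock, check the condition, and call `wait` with the guard until the condition holds.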

Comment on lines 350 to 359
let mut status_of_keys = self.status_of_keys.write().await;
let ((status, size), notify) = status_of_keys.get_mut(&remote_location).unwrap();
*status = Status::Completed;
*size = bytes_mem_written;
{
    let mut mem_replacer = self.mem_replacer.as_ref().unwrap().lock().await;
    mem_replacer.pin(&remote_location, *notify.1.lock().await);
}
notify.0.notify_waiters();
return Ok(bytes_mem_written);
Member:

Shall we adopt the same logic to ensure the atomicity here as well?

Collaborator (Author):

`let mut status_of_keys = self.status_of_keys.write().await;` has already ensured the atomicity, I guess lol

Collaborator (Author):

Also, I want to move `notify.0.notify_waiters();` out of the scope of the `status_of_keys` write lock, so woken threads can immediately acquire the lock?
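A self-contained illustration of that reordering (hypothetical types, not the PR's code): clone the per-key `Notify` inside the critical section and call `notify_waiters` only after the write guard has been dropped.

```rust
use std::sync::Arc;
use tokio::sync::{Notify, RwLock};

/// Illustrative only: `bool` stands in for the Completed/Incompleted status.
pub async fn complete_and_notify(state: &RwLock<(bool, Arc<Notify>)>) {
    let notify = {
        let mut guard = state.write().await;
        guard.0 = true; // e.g. mark the key as Completed
        guard.1.clone()
        // The write guard is dropped at the end of this block.
    };
    // Waiters woken here can acquire the lock immediately, because the
    // notifier no longer holds it.
    notify.notify_waiters();
}
```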

Comment on lines 398 to 406
let ((status, size), notify) = status_of_keys.get_mut(&remote_location).unwrap();
*status = Status::Completed;
*size = data_size;
self.disk_replacer
    .lock()
    .await
    .pin(&remote_location, *notify.1.lock().await);
// FIXME: disk_replacer lock should be released here
notify.0.notify_waiters();
Member:

And also here?

 debug!("-------- Evicting Key: {:?} --------", key);
 evicted_keys.push(key);
-self.size -= cache_value.size();
+self.size -= cache_value.0.size();
Member:

Consider constructing another struct to replace this .0.

Collaborator (Author):

We can later add a GitHub issue to record all the refactoring actions : (
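A possible shape for that refactor, complementing the tuple-based sketch earlier in the thread; everything here is illustrative except the idea of giving the pin count a named field:

```rust
/// Hypothetical wrapper pairing a cached value with its pin count; the map
/// would then become `LinkedHashMap<K, CacheEntry<V>>`, and the eviction code
/// above would read `cache_entry.value.size()` instead of `cache_value.0.size()`.
pub struct CacheEntry<V> {
    pub value: V,
    pub pin_count: usize,
}

impl<V> CacheEntry<V> {
    pub fn new(value: V) -> Self {
        Self { value, pin_count: 0 }
    }

    pub fn pin(&mut self, count: usize) {
        self.pin_count += count;
    }

    pub fn unpin(&mut self) {
        self.pin_count = self.pin_count.saturating_sub(1);
    }

    pub fn is_pinned(&self) -> bool {
        self.pin_count > 0
    }
}
```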

lanlou1554 merged commit c994a44 into main on Apr 29, 2024
1 check passed
lanlou1554 deleted the ll/unlock_disk_manager branch on April 29, 2024 at 21:16